Characterizing Mechanisms for Factual Recall in Language Models
Language Models (LMs) often must integrate facts they memorized in
pretraining with new information that appears in a given context. These two
sources can disagree, causing competition within the model, and it is unclear
how an LM will resolve the conflict. On a dataset that queries for knowledge of
world capitals, we investigate both distributional and mechanistic determinants
of LM behavior in such situations. Specifically, we measure the proportion of
the time an LM will use a counterfactual prefix (e.g., "The capital of Poland
is London") to overwrite what it learned in pretraining ("Warsaw"). On Pythia
and GPT2, the training frequency of both the query country ("Poland") and the
in-context city ("London") highly affect the models' likelihood of using the
counterfactual. We then use head attribution to identify individual attention
heads that promote either the memorized answer or the in-context answer in the
logits. By scaling up or down the value vector of these heads, we can control
the likelihood of using the in-context answer on new data. This method can
increase the rate of generating the in-context answer to 88% of the time
simply by scaling a single head at runtime. Our work contributes to a body of
evidence showing that we can often localize model behaviors to specific
components and provides a proof of concept for how future methods might control
model behavior dynamically at runtime.
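As a toy illustration of the head-scaling idea in this abstract, the sketch below treats each attention head's effect on two candidate logits as a fixed vector and multiplies one head's contribution by a scale factor. The per-head numbers and the names HEADS, combine_heads and prob_in_context are hypothetical. It also assumes that a head's logit contribution is linear in its value vector, which ignores layer normalization and later nonlinear layers. It is not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combine_heads(head_contribs, scales):
    """Sum per-head logit contributions after scaling each head."""
    n = len(head_contribs[0])
    return [
        sum(s * h[i] for h, s in zip(head_contribs, scales))
        for i in range(n)
    ]


# Hypothetical contributions to the logits of
# [memorized answer ("Warsaw"), in-context answer ("London")].
HEADS = [
    [2.0, 0.5],  # assumed head that promotes the memorized answer
    [0.3, 1.5],  # assumed head that promotes the in-context answer
]


def prob_in_context(alpha):
    """Probability of the in-context answer when head 1 is scaled by alpha."""
    logits = combine_heads(HEADS, [1.0, alpha])
    return softmax(logits)[1]


if __name__ == "__main__":
    for alpha in (0.0, 1.0, 3.0):
        print(alpha, round(prob_in_context(alpha), 3))
```

In this toy setting, raising the scale on the in-context head shifts probability toward the in-context answer, and lowering it does the opposite.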
Are Language Models Worse than Humans at Following Prompts? It's Complicated
Prompts have been the center of progress in advancing language models'
zero-shot and few-shot performance. However, recent work finds that models can
perform surprisingly well when given intentionally irrelevant or misleading
prompts. Such results may be interpreted as evidence that model behavior is not
"human like". In this study, we challenge a central assumption in such work:
that humans would perform badly when given pathological instructions. We find
that humans are able to reliably ignore irrelevant instructions and thus, like
models, perform well on the underlying task despite an apparent lack of signal
regarding the task they are being asked to do. However, when given deliberately
misleading instructions, humans follow the instructions faithfully, whereas
models do not. Our findings caution that future research should not idealize
human behaviors as a monolith and should not train or evaluate models to mimic
assumptions about these behaviors without first validating humans' behaviors
empirically. Comment: EMNLP 202
Does CLIP Bind Concepts? Probing Compositionality in Large Image Models
Large-scale neural network models combining text and images have made
incredible progress in recent years. However, it remains an open question to
what extent such models encode compositional representations of the concepts
over which they operate, such as correctly identifying "red cube" by
reasoning over the constituents "red" and "cube". In this work, we focus on
the ability of a large pretrained vision and language model (CLIP) to encode
compositional concepts and to bind variables in a structure-sensitive way
(e.g., differentiating "cube behind sphere" from "sphere behind cube"). In
order to inspect the performance of CLIP, we compare several architectures from
research on compositional distributional semantics models (CDSMs), a line of
research that attempts to implement traditional compositional linguistic
structures within embedding spaces. We find that CLIP can compose concepts in a
single-object setting, but in situations where concept binding is needed,
performance drops dramatically. At the same time, CDSMs also perform poorly,
with best performance at chance level.
Adverse drug events in Chinese elder inpatients: a retrospective review for evaluating the efficiency of the Global Trigger Tool
Background: Elderly patients frequently experience a high incidence of adverse drug events (ADEs) due to the coexistence of multiple diseases, the combination of various medications, poor medication compliance, and other factors. The Global Trigger Tool (GTT) is a new method for identifying ADEs. It introduces the concept of a trigger, that is, a clue such as an abnormal laboratory value, a reversal drug, or a clinical symptom that may suggest an ADE, and uses triggers to locate ADE-related information in the medical record. The aim of this study was to establish a GTT-based trigger tool for adverse medication events in elderly patients and to investigate the risk variables associated with such events. Methods: The triggers were identified by reviewing the frequency of ADEs in elderly patients in Sichuan, China, retrieving relevant literature, and consulting experts. A retrospective analysis was carried out to identify adverse medication occurrences among 480 elderly inpatients in Sichuan People’s Hospital. Results: A total of 56 ADEs were detected in 51 patients (10.62%), corresponding to 13.04 per 1,000 patient days and 11.67 per 100 admissions. The overall positive predictive value (PPV) of the triggers was 23.84, and 94.64% of ADEs caused temporary injury. Gastrointestinal system injury (27.87%) and metabolic and nutritional disorders (24.53%) were the primary organ systems affected by ADEs. The majority of ADEs were caused by drugs used to treat cardiovascular diseases. 71.43% of ADEs occurred within 2 days of administration, and the risk factor analysis showed that the number of medicines was significantly correlated with ADEs. Conclusion: This study demonstrated GTT’s value as a tool for ADE detection in elderly inpatients in China. It enhances the level of medication management and comprehensively reflects the situation of ADEs in the elderly.
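The incidence figures in this abstract follow from simple ratios, which the short sketch below reproduces. It assumes each of the 480 inpatients corresponds to one admission. The number of patient days is not reported in the abstract, so the value computed here is only a back-calculation from the stated rate, and the helper name rate_per is hypothetical.

```python
def rate_per(events, denominator, per):
    """Return events per `per` units of the denominator."""
    return events / denominator * per


ADES = 56               # ADEs detected (from the abstract)
PATIENTS_WITH_ADE = 51  # patients with at least one ADE (from the abstract)
ADMISSIONS = 480        # assumption: one admission per reviewed inpatient

# ADEs per 100 admissions; the abstract reports 11.67.
ades_per_100_admissions = rate_per(ADES, ADMISSIONS, 100)

# Percentage of patients with an ADE; the abstract reports 10.62%.
pct_patients_with_ade = rate_per(PATIENTS_WITH_ADE, ADMISSIONS, 100)

# Back-calculated, not reported: patient days implied by
# 13.04 ADEs per 1,000 patient days.
implied_patient_days = ADES / 13.04 * 1000

if __name__ == "__main__":
    print(f"{ades_per_100_admissions:.2f} ADEs per 100 admissions")
    print(f"{pct_patients_with_ade:.3f}% of patients with an ADE")
    print(f"about {implied_patient_days:.0f} implied patient days")
```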
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed
Fuzzy Evaluation of Crowd Safety Based on Pedestrians’ Number and Distribution Entropy
Crowd video monitoring and analysis is a hot topic in computer vision and public management. Evaluating crowd safety in advance helps predict crowd status and avoid catastrophic events. This paper proposes a method to evaluate crowd safety based on fuzzy inference. The number of pedestrians and the uniformity of their distribution are treated as two attributes of a crowd in a fuzzy inference system. First, the number of pedestrians is estimated from the number of foreground pixels. Then, the distribution uniformity of the crowd is calculated as a distribution entropy, obtained by dividing the monitored scene into several small areas. Next, a fuzzy system is constructed with two input variables (number of pedestrians and distribution entropy) and one output variable (crowd safety status). Finally, inference rules relating the crowd safety state to the number of pedestrians and the distribution uniformity are constructed to obtain the pre-evaluation of the crowd's safety state. Three video sequences extracted from different scenes are used in the experiment. Experimental results show that the proposed method can be used to evaluate the safety status of the crowd in a monitored scene.
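The two inputs described in this abstract can be sketched as follows, assuming a binary foreground mask (1 for a foreground pixel, 0 for background) and a normalized Shannon entropy over grid cells. The abstract does not give the exact entropy formula, so the normalization and the function names foreground_count and distribution_entropy are assumptions. The fuzzy inference stage is not shown.

```python
import math


def foreground_count(mask):
    """Number of foreground pixels, used as a proxy for crowd size."""
    return sum(sum(row) for row in mask)


def distribution_entropy(mask, grid_rows, grid_cols):
    """Normalized Shannon entropy of foreground pixels over a grid.

    Returns a value in [0, 1]: 0 when all foreground pixels fall in one
    cell, 1 when they are spread evenly across all cells.
    """
    height, width = len(mask), len(mask[0])
    counts = [0] * (grid_rows * grid_cols)
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                cell_row = y * grid_rows // height
                cell_col = x * grid_cols // width
                counts[cell_row * grid_cols + cell_col] += 1
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = sum(p * math.log(1.0 / p) for p in probs)
    return entropy / math.log(len(counts))


if __name__ == "__main__":
    clustered = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(foreground_count(clustered))
    print(distribution_entropy(clustered, 2, 2))
```

A low entropy combined with a high foreground count would indicate a dense, clustered crowd, which is the kind of case the inference rules are meant to flag.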
Detection of Shoot Beetle Stress on Yunnan Pine Forest Using a Coupled LIBERTY2-INFORM Simulation
Yunnan pine shoot beetles (PSB), Tomicus yunnanensis and Tomicus minor, have spread through southwestern China in the last five years, damaging millions of hectares of forest. Thus, there is an urgent need to develop an effective approach for accurate early warning and damage assessment of PSB outbreaks. Remote sensing is one of the most efficient methods for this purpose. Although many studies exist on the mountain pine beetle (MPB), very little work has been undertaken on assessing PSB stress using remote sensing. The objective of this paper was to develop a spectral linear mixing model aided by radiative transfer (RT) and a new Yellow Index (YI) to simulate the reflectance of heterogeneous canopies containing damaged needles and to quantitatively invert their PSB stress. The YI, the fraction of dead needles, is a physically explicit stress indicator that represents the plot shoot damage ratio (plot SDR). The major steps of this method include: (1) LIBERTY2 was developed to simulate the reflectance of damaged needles using YI to linearly mix the green needle spectra with the dead needle spectra; (2) LIBERTY2 was coupled with the INFORM model to scale the needle spectra to the canopy scale; and (3) a look-up table (LUT) was created and inverted against Sentinel 2 (S2) imagery to retrieve leaf chlorophyll content (LCC), green leaf area index (LAI) and plot SDR.
The results show that (1) LIBERTY2 effectively simulated the reflectance spectra of infested needles (mean relative error (MRE) = 1.4–18%), and the YI can indicate the degree of needle damage; (2) the coupled LIBERTY2-INFORM model is suitable for estimating LAI (R² = 0.73, RMSE = 0.17 m² m⁻², NRMSE = 11.41% and the index of agreement (IOA) = 0.92) and LCC (R² = 0.49, RMSE = 56.24 mg m⁻², NRMSE = 25.22% and IOA = 0.72), and is better than the original LIBERTY model (LAI: R² = 0.38, RMSE = 0.43 m² m⁻², NRMSE = 28.85% and IOA = 0.68; LCC: R² = 0.34, RMSE = 76.44 mg m⁻², NRMSE = 34.23% and IOA = 0.57); and (3) the inverted YI is positively correlated with the measured plot SDR (R² = 0.40, RMSE = 0.15). We conclude that the LIBERTY2 model improved the reflectance simulation accuracy of both the needles and canopies, making it suitable for assessing PSB stress. The YI has the potential to assess PSB damage.
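The linear mixing step and a look-up-table style inversion can be illustrated at the needle level with the sketch below. It assumes the mixed spectrum is (1 - YI) times the green spectrum plus YI times the dead spectrum, which matches the description of YI as the fraction of dead needles. The four-band reflectance values and the function names are hypothetical. The sketch leaves out the INFORM canopy scaling and the retrieval of LCC and LAI, so it is not the LIBERTY2-INFORM model itself.

```python
def mix_needle_spectrum(green, dead, yi):
    """Linearly mix green and dead needle spectra with dead fraction yi."""
    if not 0.0 <= yi <= 1.0:
        raise ValueError("yi must lie in [0, 1]")
    return [(1.0 - yi) * g + yi * d for g, d in zip(green, dead)]


def rmse(a, b):
    """Root-mean-square error between two equal-length spectra."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def invert_yi(observed, green, dead, steps=100):
    """Brute-force look-up: return the YI whose mixed spectrum fits best."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda yi: rmse(observed, mix_needle_spectrum(green, dead, yi)),
    )


# Hypothetical reflectance values in four bands (not measured data).
GREEN = [0.05, 0.10, 0.04, 0.45]
DEAD = [0.10, 0.18, 0.25, 0.35]

if __name__ == "__main__":
    observed = mix_needle_spectrum(GREEN, DEAD, 0.3)
    print(invert_yi(observed, GREEN, DEAD))
```

With noise-free synthetic input the look-up recovers the YI used to generate the spectrum. Real imagery would add noise and canopy effects that this sketch does not model.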
A Low-Power ADPLL with Calibration-Free RO-Based Injection-Locking TDC for BLE Applications
This paper proposes a low-power all-digital phase-locked loop (ADPLL) with a calibration-free ring oscillator (RO)-based injection-locking time-to-digital converter (TDC) for Bluetooth Low Energy (BLE) applications. The RO is reused as the delay cell of the TDC, and the quantization step of the TDC always tracks the RO period; hence no calibration is needed in this architecture. We adopt RO tuning to lower the injection-locking bandwidth so as to decrease the power consumption of the injection current. Moreover, the fractional part of the phase error detection is turned off during the coarse tuning of the ADPLL to save power. An LC-based digitally controlled oscillator (LCDCO) with a 6.4 nH inductor and a resistive bias is used to achieve low power and better phase noise performance. The ADPLL is fabricated in 40 nm CMOS with a 1 V supply and consumes 1.4 mW when it is locked. The measured phase noise is −114 dBc/Hz at 1 MHz offset. The test results show significant power saving. Thus, it can be a promising candidate for BLE applications.